AITopics | contextual multi-armed bandit

Causal Feature Selection Method for Contextual Multi-Armed Bandits in Recommender System

Zhao, Zhenyu, Jiang, Yexi

arXiv.org Machine Learning, Sep-20-2024

Features (a.k.a. context) are critical to the performance of contextual multi-armed bandits (MAB). In large-scale online systems, it is important to select and implement the important features for the model: missing important features can lead to sub-optimal reward outcomes, and including irrelevant features can cause overfitting, poor model interpretability, and added implementation cost. However, feature selection methods for conventional machine learning models fall short for contextual MAB use cases, as conventional methods select features correlated with the outcome variable, but not necessarily those causing heterogeneous treatment effects among arms, which are what truly matter for contextual MAB. In this paper, we introduce model-free feature selection methods designed for the contextual MAB problem, based on the heterogeneous causal effect contributed by the feature to the reward distribution. Empirical evaluation is conducted on synthetic data as well as real data from an online experiment for optimizing content cover images in a recommender system. The results show that this feature selection method effectively selects the important features that lead to higher contextual MAB reward than unimportant features. Compared with model-embedded methods, this model-free method has the advantages of fast computation speed, ease of implementation, and freedom from model mis-specification issues.
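The distinction the abstract draws, between a feature that merely predicts the reward and one that changes which arm is best, can be illustrated with a small sketch. This is not the paper's method: the function name `heterogeneity_score`, the two-arm setup, and the record format are assumptions made for illustration, and the sketch presumes arms were assigned at random in the logged data.

```python
from collections import defaultdict

def heterogeneity_score(records, control_arm, treated_arm):
    """Score one categorical feature (hypothetical helper, not from the paper).

    records: iterable of (feature_value, arm, reward) tuples from randomized logging.
    Returns the sample-weighted variance, across feature values, of the
    difference in mean reward between treated_arm and control_arm.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, arm, reward in records:
        sums[(value, arm)] += reward
        counts[(value, arm)] += 1

    effects, weights = [], []
    for value in {v for v, _ in counts}:
        n_c = counts.get((value, control_arm), 0)
        n_t = counts.get((value, treated_arm), 0)
        if n_c == 0 or n_t == 0:
            continue  # the arm difference cannot be estimated for this value
        effect = sums[(value, treated_arm)] / n_t - sums[(value, control_arm)] / n_c
        effects.append(effect)
        weights.append(n_c + n_t)

    if not effects:
        return 0.0
    total = sum(weights)
    mean_effect = sum(w * e for w, e in zip(weights, effects)) / total
    return sum(w * (e - mean_effect) ** 2 for w, e in zip(weights, effects)) / total
```

A feature whose values shift the reward level but leave the gap between arms unchanged scores zero here, while a feature whose values flip the better arm scores high, which is the property the abstract argues correlation-based selection misses.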

causal feature selection method, contextual multi-armed bandit, divergence, (10 more...)

2409.13888

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
Europe > Italy > Apulia > Bari (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.72)
Information Technology > Data Science > Data Mining > Big Data (0.63)

Transfer Learning for Contextual Multi-armed Bandits

Cai, Changxiao, Cai, T. Tony, Li, Hongzhe

arXiv.org Artificial Intelligence, Nov-22-2022

Motivated by a range of applications, we study in this paper the problem of transfer learning for nonparametric contextual multi-armed bandits under the covariate shift model, where we have data collected on source bandits before the start of the target bandit learning. The minimax rate of convergence for the cumulative regret is established and a novel transfer learning algorithm that attains the minimax regret is proposed. The results quantify the contribution of the data from the source domains for learning in the target domain in the context of nonparametric contextual multi-armed bandits. In view of the general impossibility of adaptation to unknown smoothness, we develop a data-driven algorithm that achieves near-optimal statistical guarantees (up to a logarithmic factor) while automatically adapting to the unknown parameters over a large collection of parameter spaces under an additional self-similarity assumption. A simulation study is carried out to illustrate the benefits of utilizing the data from the auxiliary source domains for learning in the target domain.

artificial intelligence, data mining, machine learning, (18 more...)

2211.12612

Country: North America > United States > Pennsylvania (0.04)

Genre: Research Report (0.63)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Heyse

AAAI Conferences, Feb-8-2022, 11:08:17 GMT

The World Health Organisation (WHO) states that "There is no health without mental health". Population health studies show that the most common mental disorders are anxiety disorders. Nowadays, Virtual Reality Exposure Therapy (VRET) is used to help people manage their anxiety. The next step forward is the personalisation of VRET to further improve therapy and patient motivation. The effects of VRET would be increased even further by automating this personalisation, taking background information and data from wearables into account. In the ongoing PATRONUS project, we aim to design such a system, one that provides truly personalised VRET. In light of this project, this paper discusses the current shortcomings of Contextual Multi-Armed Bandits and related challenges in personalisation. Future research areas are proposed, namely the use of semantics in reinforcement learning and Contextual Multi-Armed Bandits for personalisation, as well as clustering patients based on background information in order to train better models.

contextual multi-armed bandit, personalisation, vret, (3 more...)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

Counterfactual Contextual Multi-Armed Bandit: a Real-World Application to Diagnose Apple Diseases

Sottocornola, Gabriele, Stella, Fabio, Zanker, Markus

arXiv.org Artificial Intelligence, Feb-8-2021

Post-harvest diseases of apples are one of the major issues in the economic sector of apple production, causing severe economic losses to producers. Thus, we developed DSSApple, a picture-based decision support system able to help users in the diagnosis of apple diseases. Specifically, this paper addresses the problem of sequentially optimizing for the best diagnosis, leveraging past interactions with the system and their contextual information (i.e. the evidence provided by the users). The problem of learning an online model while optimizing for its outcome is commonly addressed in the literature through a stochastic active learning paradigm, i.e. the Contextual Multi-Armed Bandit (CMAB). This methodology interactively updates the decision model, considering the success of each past interaction with respect to the context provided in each round. However, this information is very often partial and inadequate for handling such complex decision-making problems. On the other hand, human decisions implicitly include unobserved factors (referred to in the literature as unobserved confounders) that significantly contribute to the human's final decision. In this paper, we take advantage of the information embedded in the observed human decisions to marginalize confounding factors and improve the capability of the CMAB model to identify the correct diagnosis. Specifically, we propose a Counterfactual Contextual Multi-Armed Bandit, a model based on the causal concept of the counterfactual. The proposed model is validated with offline experiments based on data collected through a large user study on the application. The results show that our model is able to outperform both traditional CMAB algorithms and observed user decisions in the real-world task of predicting the correct apple disease.

artificial intelligence, data mining, machine learning, (18 more...)

2102.04214

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > Canada > Quebec > Montreal (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(15 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (1.00)
Food & Agriculture > Agriculture (0.48)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Randomized Allocation with Nonparametric Estimation for Contextual Multi-Armed Bandits with Delayed Rewards

Arya, Sakshi, Yang, Yuhong

arXiv.org Machine Learning, Feb-4-2019

Multi-armed bandits were first introduced in the landmark paper by Robbins (1952). The development of multi-armed bandit methodology has been partly motivated by clinical trials, with the aim of balancing two competing goals: 1) to effectively identify the best treatment (exploration) and 2) to treat patients as effectively as possible during the trial (exploitation). The classic formulation of the multi-armed bandit problem in the context of clinical practice is as follows: there are l treatments (arms) to treat a disease. The doctor (decision maker) has to choose, for each patient, one of the l available treatments, which results in a reward (response) in the form of an improvement in the patient's condition. The goal is to maximize the cumulative reward.
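The clinical formulation above can be made concrete with a minimal simulation. The sketch below uses epsilon-greedy allocation, a standard textbook strategy chosen here for brevity rather than the randomized allocation studied in the paper; the function name, the Bernoulli (success/failure) reward model, and the `success_probs` values are all assumptions for illustration.

```python
import random

def run_epsilon_greedy(success_probs, n_patients, epsilon=0.1, seed=0):
    """Simulate assigning each patient one of len(success_probs) treatments.

    success_probs[a] is the (hypothetical, unknown to the doctor) chance that
    treatment a improves a patient's condition; reward is 1 on improvement.
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms     # patients assigned to each treatment so far
    means = [0.0] * n_arms    # running mean reward observed per treatment
    total_reward = 0
    for _ in range(n_patients):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # exploration
        else:
            arm = max(range(n_arms), key=lambda a: means[a])   # exploitation
        reward = 1 if rng.random() < success_probs[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]      # incremental mean
        total_reward += reward
    return total_reward, counts, means
```

For example, `run_epsilon_greedy([0.3, 0.6, 0.5], 1000)` returns the cumulative reward together with how many patients received each treatment. Raising `epsilon` spends more patients on exploration, lowering it commits earlier to the treatment that currently looks best.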

bandit problem, covariate, delay distribution, (15 more...)

1902.00819

Country:

North America > United States > Minnesota (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.48)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.48)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Contextual Multi-Armed Bandits for Causal Marketing

Sawant, Neela, Namballa, Chitti Babu, Sadagopan, Narayanan, Nassif, Houssam

arXiv.org Machine Learning, Oct-2-2018

This work explores the idea of a causal contextual multi-armed bandit approach to automated marketing, where we estimate and optimize the causal (incremental) effects. Focusing on causal effects leads to better return on investment (ROI) by targeting only the persuadable customers who wouldn't have taken the action organically. Our approach draws on the strengths of causal inference, uplift modeling, and multi-armed bandits. It optimizes on causal treatment effects rather than pure outcome, and incorporates counterfactual generation within data collection. Following uplift modeling results, we optimize over the incremental business metric. Multi-armed bandit methods allow us to scale to multiple treatments and to perform off-policy policy evaluation on logged data. The Thompson sampling strategy in particular enables exploration of treatments on similar customer contexts and materialization of counterfactual outcomes. Preliminary offline experiments on a retail fashion marketing dataset show the merits of our proposal.

customer, data mining, machine learning, (16 more...)

1810.01859

Country:

Europe (0.68)
North America > United States (0.28)

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.68)

Industry: Marketing (0.47)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Graph-Based Recommendation System

Yang, Kaige, Toni, Laura

arXiv.org Machine Learning, Jul-31-2018

In this work, we study recommendation systems modelled as contextual multi-armed bandit (MAB) problems. We propose a graph-based recommendation system that learns and exploits the geometry of the user space to create meaningful clusters in the user domain. This reduces the dimensionality of the recommendation problem while preserving the accuracy of MAB. We then study the effect of graph sparsity and cluster size on MAB performance and provide exhaustive simulation results on both synthetic and real-world datasets. Simulation results show improvements with respect to state-of-the-art MAB algorithms.

artificial intelligence, data mining, machine learning, (15 more...)

1808.00004

Country: Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)

Deep Contextual Multi-armed Bandits

Collier, Mark, Llorens, Hector Urdiales

arXiv.org Machine Learning, Jul-25-2018

Contextual multi-armed bandit problems arise frequently in important industrial applications. Existing solutions model the context either linearly, which enables uncertainty driven (principled) exploration, or non-linearly, by using epsilon-greedy exploration policies. Here we present a deep learning framework for contextual multi-armed bandits that is both non-linear and enables principled exploration at the same time. We tackle the exploration vs. exploitation trade-off through Thompson sampling by exploiting the connection between inference time dropout and sampling from the posterior over the weights of a Bayesian neural network. In order to adjust the level of exploration automatically as more data is made available to the model, the dropout rate is learned rather than considered a hyperparameter. We demonstrate that our approach substantially reduces regret on two tasks (the UCI Mushroom task and the Casino Parity task) when compared to 1) non-contextual bandits, 2) epsilon-greedy deep contextual bandits, and 3) fixed dropout rate deep contextual bandits. Our approach is currently being applied to marketing optimization problems at HubSpot.
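The connection between inference-time dropout and Thompson sampling mentioned above can be sketched as follows: keeping dropout active when choosing an arm makes each forward pass behave like one draw of network weights, and acting greedily on that single draw gives Thompson-style exploration. This is a simplified illustration under stated assumptions, not the paper's implementation: the one-hidden-layer network, the function names, and the weight layout are hypothetical, training is omitted, and the dropout rate is fixed here whereas the abstract describes learning it.

```python
import random

def forward_with_dropout(context, w_hidden, w_out, drop_rate, rng):
    """One stochastic forward pass of a one-hidden-layer ReLU network.

    w_hidden[j] holds the input weights of hidden unit j (hypothetical layout);
    w_out[a] holds the hidden-to-output weights for arm a.
    drop_rate must be in [0, 1).
    """
    hidden = []
    for unit_weights in w_hidden:
        pre = sum(w * x for w, x in zip(unit_weights, context))
        act = max(0.0, pre)                  # ReLU activation
        if rng.random() < drop_rate:
            act = 0.0                        # unit dropped for this decision
        else:
            act /= (1.0 - drop_rate)         # inverted-dropout rescaling
        hidden.append(act)
    # One predicted reward per arm under the sampled dropout mask.
    return [sum(w * h for w, h in zip(arm_weights, hidden)) for arm_weights in w_out]

def choose_arm(context, w_hidden, w_out, drop_rate, rng):
    """Thompson-style choice: act greedily on a single sampled prediction."""
    sampled = forward_with_dropout(context, w_hidden, w_out, drop_rate, rng)
    return max(range(len(sampled)), key=lambda a: sampled[a])
```

With `drop_rate` set to 0 the choice is purely greedy; larger rates produce more varied samples and therefore more exploration, which is why the abstract treats the rate as something to adapt as data accumulates.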

bandit, data mining, machine learning, (22 more...)

1807.09809

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)